
feat(trigger): data class and helper functions for lambda trigger events #159

Merged
merged 30 commits into from
Sep 18, 2020

Conversation

michaelbrewer
Contributor

@michaelbrewer michaelbrewer commented Sep 7, 2020

Issue #, if available: #164

Description of changes:

For each Lambda trigger event there is a corresponding typed data class, with docstrings extracted from the AWS documentation and helper methods for updating values as well as decoding any embedded data. The test suite includes a series of example trigger events.


Example screenshot of the docs:

[Screenshot: Screen Shot 2020-09-10 at 12 15 59 PM]

CloudWatchLogsEvent example usage:

from aws_lambda_powertools.utilities.data_classes import CloudWatchLogsEvent

def handler(event: Dict[str, Any], context: LambdaContext):
    # CloudWatchLogsEvent provides a typed data class with a method to decode gzipped log messages
    logs_data = CloudWatchLogsEvent(event).parse_logs_data()
    # CloudWatchLogsDecodedData is another typed data class with docstrings and getters
    for log_event in logs_data.log_events:
        handle_log_message(log_event.message)

PreTokenGenerationTriggerEvent example usage:

def handler(event: Dict[str, Any], context: LambdaContext) -> Dict[str, Any]:
    # PreTokenGenerationTriggerEvent data class includes docstrings and helper methods
    pre_token_event = PreTokenGenerationTriggerEvent(event)

    claims_override_details = pre_token_event.response.claims_override_details
    if pre_token_event.user_name == "SpecialUser":
        # ClaimsOverrideDetails has convenient setters
        claims_override_details.claims_to_suppress = ["Email"]
        claims_override_details.claims_to_add_or_override = {"attribute": "value"}

    if "SpecialGroup" in pre_token_event.request.group_configuration.groups_to_override:
        # A setter method for only part of groupOverrideDetails
        claims_override_details.set_group_configuration_preferred_role("preferred_value")

    # Updates are made to the original dict
    return event

S3Event example usage:

from aws_lambda_powertools.utilities.data_classes import S3Event

def handler(event: Dict[str, Any], context: LambdaContext) -> Dict[str, Any]:
    # S3Event data class includes docstrings and helper methods
    s3_event = S3Event(event)

    # Convenience methods for popular attributes
    bucket_name = s3_event.bucket_name
    key = s3_event.object_key

    # Handle S3 event notification
    handle_record(bucket_name, key)

    return {"statusCode": 200, "body": f"Successfully processed {bucket_name}/{key}"}

APIGatewayProxyEvent example (V1 and V2 are supported):

from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEventV2

def handler(event: Dict[str, Any], context: LambdaContext) -> Dict[str, Any]:
    event = APIGatewayProxyEventV2(event)

    # Comprehensive property access with docs
    claim_value = event.request_context.authorizer.jwt_claim.get("claim_key")
    # Query helper with optional default_value if no parameter is found
    qs_value = event.get_query_string_value("param", default_value="Default value")
    # Header helper with optional default_value
    header_value = event.get_header_value("name")
    print(f"Handle event with claim_value: {claim_value}, qs_value: {qs_value}, header_value: {header_value}")

    return {"statusCode": 200, "body": "Success!"}

KinesisStreamEvent example with decoding data:

from aws_lambda_powertools.utilities.data_classes import KinesisStreamEvent

def handler(event: Dict[str, Any], context):
    kinesis_event = KinesisStreamEvent(event)
    for record in kinesis_event.records:
        # Kinesis data is base64 encoded so decode and load as json
        payload: dict = record.kinesis.data_as_json()
        print(f"Decoded payload: {payload}")

Checklist

  • Meet tenets criteria
  • Update tests
  • Update docs
  • PR title follows conventional commit semantics
  • API Gateway Proxy V1 and V2 events
  • Application Load Balancer events
  • CloudWatch log event with decoding of the gzipped logs
  • Cognito user pool events
  • DynamoDB stream events
  • Event bridge
  • Kinesis stream events
  • S3 event notifications
  • SES events
  • SNS events
  • SQS events

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov-commenter

codecov-commenter commented Sep 7, 2020

Codecov Report

Merging #159 into develop will not change coverage.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop      #159     +/-   ##
===========================================
  Coverage   100.00%   100.00%             
===========================================
  Files           33        46     +13     
  Lines          918      2070   +1152     
  Branches        77        90     +13     
===========================================
+ Hits           918      2070   +1152     
Impacted Files Coverage Δ
...mbda_powertools/utilities/data_classes/__init__.py 100.00% <100.00%> (ø)
...bda_powertools/utilities/data_classes/alb_event.py 100.00% <100.00%> (ø)
.../utilities/data_classes/api_gateway_proxy_event.py 100.00% <100.00%> (ø)
...s/utilities/data_classes/cloud_watch_logs_event.py 100.00% <100.00%> (ø)
.../utilities/data_classes/cognito_user_pool_event.py 100.00% <100.00%> (ø)
...lambda_powertools/utilities/data_classes/common.py 100.00% <100.00%> (ø)
...s/utilities/data_classes/dynamo_db_stream_event.py 100.00% <100.00%> (ø)
...tools/utilities/data_classes/event_bridge_event.py 100.00% <100.00%> (ø)
...ols/utilities/data_classes/kinesis_stream_event.py 100.00% <100.00%> (ø)
...mbda_powertools/utilities/data_classes/s3_event.py 100.00% <100.00%> (ø)
... and 16 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 063d8cd...e860731. Read the comment docs.

Michael Brewer added 10 commits September 7, 2020 08:36
Tracked down some of the AWS docs to inline with the dict wrapper
classes.
For cases when the dict name conflicts with a python builtin we should
use a consistent method name
Add support for UserMigrationTriggerEvent and include better docstrings
Add support for CustomMessageTriggerEvent and
PreAuthenticationTriggerEvent
Add support for PreTokenGenerationTriggerEvent and
PostAuthenticationTriggerEvent
@michaelbrewer michaelbrewer marked this pull request as ready for review September 9, 2020 10:42
Changes:
* Add some missing docs pull for various AWS Docs
* Fix SQS mapping
* Make docs consistent
@jplock

jplock commented Sep 10, 2020

API Gateway events?

@michaelbrewer
Contributor Author

API Gateway events?

Sure. I can add that.

@michaelbrewer
Contributor Author

API Gateway events?

@jplock - i added support for api gateway proxy events.

@michaelbrewer michaelbrewer changed the title feat(trigger): class wrapper for events feat(trigger): data class and helper functions for lambda trigger events Sep 10, 2020
Michael Brewer added 2 commits September 12, 2020 11:34
For both SNS and S3 event notifications we actually include a single record, so we can have convenience methods to access popular resources

class CloudWatchLogsLogEvent:
    def __init__(self, log_event: dict):
        self._val = log_event

Before you assign the event, it's a good idea to check first whether the event has the keys/values that you expect and whether the event version is a version that you can handle.
Current event formats: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-and-eventbridge.html

Sample event format:

{
    "version": "0",
    "id": "c4c1c1c9-6542-e61b-6ef0-8c4d36933a92",
    "detail-type": "CloudWatch Alarm State Change",
    "source": "aws.cloudwatch",
    "account": "123456789012",
    "time": "2019-10-02T17:04:40Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:cloudwatch:us-east-1:123456789012:alarm:ServerCpuTooHigh"
    ],
    "detail": {
        "alarmName": "ServerCpuTooHigh",
        "configuration": {
            "description": "Goes into alarm when server CPU utilization is too high!",
            "metrics": [
                {
                    "id": "30b6c6b2-a864-43a2-4877-c09a1afc3b87",
                    "metricStat": {
                        "metric": {
                            "dimensions": {
                                "InstanceId": "i-12345678901234567"
                            },
                            "name": "CPUUtilization",
                            "namespace": "AWS/EC2"
                        },
                        "period": 300,
                        "stat": "Average"
                    },
                    "returnData": true
                }
            ]
        },
        "previousState": {
            "reason": "Threshold Crossed: 1 out of the last 1 datapoints [0.0666851903306472 (01/10/19 13:46:00)] was not greater than the threshold (50.0) (minimum 1 datapoint for ALARM -> OK transition).",
            "reasonData": "{\"version\":\"1.0\",\"queryDate\":\"2019-10-01T13:56:40.985+0000\",\"startDate\":\"2019-10-01T13:46:00.000+0000\",\"statistic\":\"Average\",\"period\":300,\"recentDatapoints\":[0.0666851903306472],\"threshold\":50.0}",
            "timestamp": "2019-10-01T13:56:40.987+0000",
            "value": "OK"
        },
        "state": {
            "reason": "Threshold Crossed: 1 out of the last 1 datapoints [99.50160229693434 (02/10/19 16:59:00)] was greater than the threshold (50.0) (minimum 1 datapoint for OK -> ALARM transition).",
            "reasonData": "{\"version\":\"1.0\",\"queryDate\":\"2019-10-02T17:04:40.985+0000\",\"startDate\":\"2019-10-02T16:59:00.000+0000\",\"statistic\":\"Average\",\"period\":300,\"recentDatapoints\":[99.50160229693434],\"threshold\":50.0}",
            "timestamp": "2019-10-02T17:04:40.989+0000",
            "value": "ALARM"
        }
    }
}

Keys that can be checked before we initialize the object:

  • version
  • source
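A minimal sketch of such a guard, checking `version` and `source` before constructing a wrapper; the function name and the supported-version set are illustrative assumptions, not part of the PR:

```python
# Hypothetical guard run before wrapping the raw event in a data class.
# SUPPORTED_VERSIONS and validate_event are illustrative names.
SUPPORTED_VERSIONS = {"0"}


def validate_event(event: dict) -> dict:
    """Fail fast if the event is missing expected keys or has an unknown version."""
    version = event.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported event version: {version}")
    source = event.get("source", "")
    if not source.startswith("aws."):
        raise ValueError(f"Unexpected event source: {source}")
    return event
```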

Contributor Author


@koiker - I did not know CloudWatch Logs events had any such variations

Contributor Author


@koiker Those docs are for CloudWatch Events sent to EventBridge. This class is handling CloudWatch Logs events and the encoded data.

Use new DictWrapper abstract class for the data classes that wrap an
event Dict
@michaelbrewer
Contributor Author

michaelbrewer commented Sep 13, 2020

@heitorlessa - would it help to include helper methods for common tasks?

eg: For SES events you might want to fetch the content from S3 by messageId and parse out the text/plain or text/html elements

import email
import quopri


# Configured S3 bucket name
s3_bucket = "..."
# Configured Object key prefix
key_prefix = "..."


def get_message_contents(message_id: str, content_type: str = "text/html") -> str:
    """ Load the email from S3 and then search through the
    message contents by content type

    :param message_id: Is used for the key to look up the message on S3
    :param content_type: Content type of the payload to return defaults to `text/html`
    :return: Returns the email contents matching the specified content_type
    """
    try:
        obj = s3.Object(s3_bucket, key_prefix + message_id)
        message_contents = obj.get()["Body"].read().decode("utf-8")
    except ClientError as e:
        raise Exception("S3 ClientError:", e.response["Error"]["Message"])
    else:
        email_message = email.message_from_string(message_contents)
        if email_message.is_multipart():
            # Get matching content_type payload
            return next(
                (
                    quopri.decodestring(part.get_payload()).decode()
                    for part in email_message.walk()
                    if content_type in part.get_content_type()
                ),
                None,
            )
        else:
            # Fail over to payload
            return email_message.get_payload()

Add support for Kinesis stream events with a helper method to decode
data
@heitorlessa
Contributor

@cakepietoast @nmoutschen

could you do a full review here sometime this week?

I'll have a quick look tomorrow but would appreciate some fresh eyes here while I can't dedicate some proper time to this yet.

PS: I'm raising a feature request for Lambda to publish a schema for all event sources so we could auto-generate most of it in the future (including for JS).

Contributor

@heitorlessa heitorlessa left a comment


Firstly, a BIG thank you @michaelbrewer -- It's a lot of work and thought that you put on here, we deeply appreciate it.

Small comment for now on data as json/decoded for CW Logs.

I need some time to think on the design - I don't want to rush, so give me a few days to pull this and play with it some more.

At a first glance, I'd like to see us having properties for the absolutely minimal necessary, where the additional docstring and getters pay off in terms of maintenance, user experience, and adding a new event source.

From the top of my head, I see a big value in having getters for

  1. Nested keys - It's painful to remember more than one key only to fetch X data
  2. Data transformation like you did on Kinesis b64, CloudWatch Logs b64+gzip+optionally json, Dynamo etc.
  3. Ambiguous properties that would benefit from additional explanations since their names or values can be easily misinterpreted

def decode_cloud_watch_logs_data(self) -> CloudWatchLogsDecodedData:
    """Decode, unzip and parse json data"""
    payload = base64.b64decode(self.aws_logs_data)
    decoded: dict = json.loads(zlib.decompress(payload, zlib.MAX_WBITS | 32).decode("UTF-8"))
Contributor


Can we either make JSON deserialisation as another property, or optionally off?

Since this is a data class utility, it's possible they'll have non-JSON data too.

I'm fine having it on by default, though -- This will likely be true for Kinesis and API GW binary

Contributor Author


@heitorlessa ok i split it up.
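A sketch of what the split might look like, with base64 decoding and decompression separated from JSON parsing so non-JSON payloads remain usable; the class and method names here are assumptions, not the exact API merged in the PR:

```python
import base64
import json
import zlib


class CloudWatchLogsEventSketch:
    """Illustrative sketch of the split: decompression and JSON parsing
    are separate steps, so non-JSON log payloads remain accessible."""

    def __init__(self, event: dict):
        self._data = event

    @property
    def raw_logs_data(self) -> str:
        """The base64-encoded, compressed `awslogs.data` field."""
        return self._data["awslogs"]["data"]

    def decompress_logs_data(self) -> bytes:
        """Base64-decode and decompress only; no JSON assumption here."""
        payload = base64.b64decode(self.raw_logs_data)
        # MAX_WBITS | 32 tells zlib to auto-detect gzip or zlib headers
        return zlib.decompress(payload, zlib.MAX_WBITS | 32)

    def parse_logs_data(self) -> dict:
        """Optionally parse the decompressed payload as JSON, as a second step."""
        return json.loads(self.decompress_logs_data().decode("utf-8"))
```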

@heitorlessa
Contributor

Firstly, a BIG thank you @michaelbrewer -- It's a lot of work and thought that you put on here, we deeply appreciate it.

Small comment for now on data as json/decoded for CW Logs.

I need some time to think on the design - I don't want to rush, so give me a few days to pull this and play with it some more.

At a first glance, I'd like to see us having properties for the absolutely minimal necessary, where the additional docstring and getters pay off in terms of maintenance, user experience, and adding a new event source.

From the top of my head, I see a big value in having getters for

  1. Nested keys - It's painful to remember more than one key only to fetch X data
  2. Data transformation like you did on Kinesis b64, CloudWatch Logs b64+gzip+optionally json, Dynamo etc.
  3. Ambiguous properties that would benefit from additional explanations since their names or values can be easily misinterpreted

One extra bit... if, however, the point is to expose all keys regardless for auto completion then it's simply a trade off we'll have to agree on as we add more event sources.

This would make sense if the end goal is to make this a separate package that other projects could benefit from.

@heitorlessa
Contributor

@heitorlessa - would it help to include helper methods which help with common tasks

eg: For SES events you might want to fetch the content from S3 by messageId and parse out the text/plain or text/html elements (SES example quoted above)

Not yet - I'd like to see more demand on these beforehand.

These (SES example) would also require IAM permissions, so there's more at play

@michaelbrewer
Contributor Author

@heitorlessa maybe once we have the schema it would replace the @property access methods, with some very basic logic to generate Python-safe names from the schema. But I would like to keep that code in place with some warning that it might change later.

For getters and decoding it can just be a small subset of frequently used elements, like query string, headers, path and body decoding for proxy API calls.

@michaelbrewer
Contributor Author

@heitorlessa - would it help to include helper methods which help with common tasks

eg: For SES events you might want to fetch the content from S3 by messageId and parse out the text/plain or text/html elements (SES example quoted above)

Not yet - I'd like to see more demand on these beforehand.

These (SES example) would also require IAM permissions, so there's more at play

Maybe for this we could just have a helper method where you provide the contents of the s3 object.
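Such a helper could take the already-fetched object body, keeping S3 access (and its IAM permissions) out of the utility. A sketch under that assumption, reusing the parsing logic from the SES example above; the function name is hypothetical:

```python
import email
import quopri
from typing import Optional


def get_payload_by_content_type(message_contents: str, content_type: str = "text/html") -> Optional[str]:
    """Parse raw email contents the caller already fetched (no S3 call here)
    and return the first payload matching content_type."""
    email_message = email.message_from_string(message_contents)
    if not email_message.is_multipart():
        # Fail over to the plain payload, as in the SES example above
        return email_message.get_payload()
    return next(
        (
            quopri.decodestring(part.get_payload()).decode()
            for part in email_message.walk()
            if content_type in part.get_content_type()
        ),
        None,
    )
```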

Michael Brewer added 2 commits September 13, 2020 21:19
Create BaseProxyEvent for some reusable code for the http proxy events
Add support for ALB events
For handling CloudWatchLogsEvent log data, split the decode and decompress step from the parse into CloudWatchLogsDecodedData
@michaelbrewer
Copy link
Contributor Author

API Gateway events?

I also included ALB events and there is a BaseProxyEvent class that has the shared code between all three.
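One plausible shape for the shared helpers, based on the get_query_string_value/get_header_value usage shown earlier in this thread; treat the internals as an illustrative sketch rather than the merged implementation:

```python
from typing import Any, Dict, Optional


class BaseProxyEventSketch:
    """Sketch of helpers shared by API Gateway V1/V2 and ALB proxy events."""

    def __init__(self, event: Dict[str, Any]):
        self._data = event

    @property
    def query_string_parameters(self) -> Optional[Dict[str, str]]:
        return self._data.get("queryStringParameters")

    @property
    def headers(self) -> Dict[str, str]:
        return self._data.get("headers") or {}

    def get_query_string_value(self, name: str, default_value: Optional[str] = None) -> Optional[str]:
        """Return a query string parameter, or default_value if absent."""
        params = self.query_string_parameters
        return default_value if params is None else params.get(name, default_value)

    def get_header_value(self, name: str, default_value: Optional[str] = None) -> Optional[str]:
        """Case-insensitive header lookup, since HTTP header names are case-insensitive."""
        return next(
            (value for key, value in self.headers.items() if key.lower() == name.lower()),
            default_value,
        )
```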

@lgtm-com

lgtm-com bot commented Sep 14, 2020

This pull request introduces 1 alert when merging a746ddc into 063d8cd - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

@heitorlessa
Contributor

Maybe for this we just have a helper method when you can provide the contents of the s3 object.

Sure, that would be fine.

@lgtm-com

lgtm-com bot commented Sep 14, 2020

This pull request introduces 1 alert when merging 7b4ec44 into 063d8cd - view on LGTM.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

Michael Brewer added 2 commits September 13, 2020 23:52
We don't need to include `_decompressed_logs_data` and `_json_logs_data`
in equality tests
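The LGTM alert above and this commit suggest basing equality only on the wrapped dict, so cached derived attributes don't affect comparisons. A minimal sketch of that idea (the class name is illustrative):

```python
class DictWrapperSketch:
    """Sketch: equality compares only the wrapped event dict, so cached
    derived attributes (e.g. a memoized decompressed payload) are ignored."""

    def __init__(self, data: dict):
        self._data = data

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, DictWrapperSketch):
            return NotImplemented
        return self._data == other._data
```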
Contributor

@to-mc to-mc left a comment


Lets change the "trigger" naming convention to something more descriptive. My suggestion is data_classes so instead of from aws_lambda_powertools.utilities.trigger import DynamoDBStreamEvent we can do from aws_lambda_powertools.utilities.data_classes import DynamoDBStreamEvent

@michaelbrewer
Contributor Author

Lets change the "trigger" naming convention to something more descriptive. My suggestion is data_classes so instead of from aws_lambda_powertools.utilities.trigger import DynamoDBStreamEvent we can do from aws_lambda_powertools.utilities.data_classes import DynamoDBStreamEvent

@cakepietoast - I just chose this name based on what I saw in other libraries. More accurately, these are the data classes for lambda trigger events.

I will update this and make the other requested changes.

@to-mc to-mc added area/utilities feature New feature or functionality labels Sep 18, 2020
Contributor

@to-mc to-mc left a comment


I'm happy with this, just need to update the documentation before we release it. I'd be glad to contribute the docs piece, let me know @michaelbrewer.

@heitorlessa
Contributor

Let's create another PR for the docs - Implementation is solid, @cakepietoast is also happy with the direction.

Feel free to merge @cakepietoast when you can

As always, huge thanks @michaelbrewer for working on this.

@to-mc to-mc merged commit b654ca6 into aws-powertools:develop Sep 18, 2020
@michaelbrewer michaelbrewer deleted the feat-event-wrapper branch February 22, 2021 13:18